DECchip 21064-AA 
Microprocessor 


Product Brief 


July, 1992 








Features

• Full 64-bit Alpha architecture:
  - Advanced RISC architecture
  - Optimized for high-performance implementations
  - Multiprocessor support
  - IEEE single and double precision, VAX F_floating and G_floating, longword
    and quadword data types
  - Cycle counter for code optimization
• Privileged Architecture Library Code (PALcode) supports:
  - Optimization for multiple operating systems
  - Flexible memory management implementations
  - Multi-instruction atomic sequences
• Ultra-high performance Alpha implementation:
  - Dual-pipelined architecture
  - 150 MHz cycle time
  - Peak instruction execution of 300 million operations per second
• On-chip write buffer with four 32-byte entries
• Selectable data bus width and speed:
  - 64 or 128 bit data width
  - 75 MHz to 18.75 MHz bus speed
• On-chip pipelined floating point unit
• 8K byte data cache
• 8K byte instruction cache
• External cache memory support:
  - On-chip external secondary cache control
  - Programmable cache size and speed
• On-chip demand paged memory management unit:
  - 12 entry I-stream TB with 8 entries for 8K byte pages and 4 entries for
    4M byte pages
  - 32 entry D-stream TB with each entry able to map 8K, 64K, 512K, or 4M byte
    pages
• On-chip parity and ECC generators and checkers
• Internal clock generator provides:
  - High-speed chip clock
  - Pair of programmable system clocks (CPU/2 to CPU/8)
• Programmable on-chip performance counters measure CPU and system performance
• Chip and module level test support
• 3.3-volt supply voltage:
  - Lower power
  - Higher reliability
  - Interface to 5-volt logic
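The headline figures in the Features list are related by simple arithmetic: dual issue at 150 MHz gives the 300 million operations per second peak, and dividing the CPU clock by 2 through 8 appears to be where the 75 MHz to 18.75 MHz bus-speed range comes from. The sketch below assumes that "peak" means both pipelines issue every cycle and that any integer divisor from 2 to 8 can be selected; neither assumption is spelled out in the brief, and the helper names are hypothetical.

```python
# Hypothetical helpers; the numbers come from the Features list.
CPU_CLOCK_MHZ = 150.0  # on-chip clock rate
ISSUE_WIDTH = 2        # dual instruction issue


def cycle_time_ns(clock_mhz: float) -> float:
    """Clock period in nanoseconds for a clock given in MHz."""
    return 1000.0 / clock_mhz


def peak_mops(clock_mhz: float, issue_width: int) -> float:
    """Peak millions of operations per second if every cycle issues issue_width instructions."""
    return clock_mhz * issue_width


def system_clock_mhz(clock_mhz: float, divisor: int) -> float:
    """System clock derived from the CPU clock (brief: CPU/2 to CPU/8)."""
    # Assumption: every integer divisor from 2 through 8 is selectable.
    if not 2 <= divisor <= 8:
        raise ValueError("divisor must be between 2 and 8")
    return clock_mhz / divisor


if __name__ == "__main__":
    print(f"cycle time: {cycle_time_ns(CPU_CLOCK_MHZ):.2f} ns")               # 6.67 ns
    print(f"peak rate:  {peak_mops(CPU_CLOCK_MHZ, ISSUE_WIDTH):.0f} M ops/s")  # 300
    for d in (2, 8):
        print(f"CPU/{d}: {system_clock_mhz(CPU_CLOCK_MHZ, d)} MHz")            # 75.0, 18.75
```

The exact period is 1000/150 ≈ 6.67 ns, which the Characteristics table quotes as 6.6 ns.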


Description 


Digital’s DECchip 21064-AA microprocessor is the first in a family of chips to
implement Digital’s Alpha architecture. The DECchip 21064-AA microprocessor is a
0.75-micron CMOS-based superscalar, superpipelined processor using dual
instruction issue and a 150 MHz clock (6.6 ns cycle time). The Alpha architecture
is a 64-bit RISC architecture designed with particular emphasis on speed, multiple
instruction issue, multiple processors, and software migration from VAX/VMS and
MIPS/ULTRIX operating environments.


DECchip 21064-AA 
MicroArchitecture 


The DECchip 21064-AA microprocessor consists of four independent functional
units: the integer execution unit (Ebox), the floating point unit (Fbox), the
load/store or address unit (Abox), and the branch unit. Other sections include
the central control unit (Ibox) and the instruction and data caches.


Ebox - Contains a 64-bit fully 
pipelined integer execution data path 
including: adder, logic box, barrel 
shifter, byte extract and mask, and 
independent integer multiplier. The 
Ebox also contains a 32-entry 64-bit 
integer register file. 


Fbox - Contains a fully pipelined 
floating point unit and independent 
divider, supporting both IEEE and 
VAX floating point data types. 
IEEE single precision and double precision floating point data types are
supported. VAX F_floating and G_floating data types are fully supported, with
limited support for the D_floating data type.
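The VAX F_floating layout is not given in this brief. The decoder below follows the VAX architecture's definition as commonly documented (sign in bit 15, an 8-bit excess-128 exponent in bits 14:7, a 23-bit fraction split across bits 6:0 and 31:16, and a hidden leading bit, so the value is 0.1f × 2^(exp-128)); treat that layout as an assumption to check against the architecture manual. It is a sketch for interpreting stored longwords, not a description of how the Fbox handles the format.

```python
def vax_f_to_float(longword: int) -> float:
    """Decode a VAX F_floating value held in a 32-bit longword.

    Assumed layout (VAX architecture): bit 15 sign, bits 14:7 excess-128
    exponent, bits 6:0 high fraction, bits 31:16 low fraction.
    """
    low_word = longword & 0xFFFF            # sign, exponent, high 7 fraction bits
    high_word = (longword >> 16) & 0xFFFF   # low 16 fraction bits
    sign = (low_word >> 15) & 1
    exponent = (low_word >> 7) & 0xFF
    fraction = ((low_word & 0x7F) << 16) | high_word  # 23 explicit fraction bits

    if exponent == 0:
        if sign:
            # sign=1, exponent=0 is a reserved operand on VAX, not a number
            raise ValueError("VAX reserved operand")
        return 0.0  # sign=0, exponent=0 is zero regardless of fraction

    # Value is 0.1fff... x 2**(exponent - 128), i.e. 1.fff... x 2**(exponent - 129)
    magnitude = (1.0 + fraction / (1 << 23)) * 2.0 ** (exponent - 129)
    return -magnitude if sign else magnitude


print(vax_f_to_float(0x00004080))  # 1.0
print(vax_f_to_float(0x0000C100))  # -2.0
```

Because the exponent bias and hidden-bit position differ from IEEE single, the same bit pattern does not mean the same number in the two formats, which is one reason the Fbox supports each format with its own instructions.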


Abox - Contains five major sections: 
address translation data path, load 
silo, write buffer, data cache 
(Dcache) interface, and the external 
bus interface unit (BIU). 


The Abox supports all integer and floating point load and store instructions,
including address calculation and translation, and cache control logic.


Ibox - Performs instruction fetch, 
resource checks, and dual instruction 
issue to the Ebox, Abox, Fbox, or 
branch unit. In addition, the Ibox 
controls pipeline stalls, aborts and 
restarts. 


[Figure: Data Bus (128 bits)]


Pipeline Organization 


The DECchip 21064-AA microprocessor uses a seven-stage pipeline for integer
operate and memory reference instructions, and a ten-stage pipeline for floating
point operate instructions. The Ibox maintains state for all pipeline stages to
track outstanding register writes.
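The brief does not say how the Ibox tracks outstanding register writes. A common technique is a scoreboard, sketched below under that assumption: each register records the cycle at which its pending write completes, and an instruction stalls until all of its source registers are ready. The class and function names are hypothetical, and the latency in the example is a made-up placeholder, not a 21064-AA figure.

```python
class Scoreboard:
    """Per-register record of when an outstanding write completes (hypothetical model)."""

    def __init__(self, num_regs: int = 32) -> None:
        # ready_at[r] is the first cycle at which register r may be read.
        self.ready_at = [0] * num_regs

    def can_issue(self, sources: list[int], cycle: int) -> bool:
        # Issue is allowed only if no source register still has a write in flight.
        return all(self.ready_at[r] <= cycle for r in sources)

    def record_write(self, dest: int, cycle: int, latency: int) -> None:
        # The result of an instruction issued at `cycle` becomes readable `latency` cycles later.
        self.ready_at[dest] = max(self.ready_at[dest], cycle + latency)


def first_issue_cycle(sb: Scoreboard, sources: list[int], cycle: int) -> int:
    """Stall (advance the cycle) until every source register is ready."""
    while not sb.can_issue(sources, cycle):
        cycle += 1
    return cycle


sb = Scoreboard()
sb.record_write(dest=1, cycle=0, latency=3)            # placeholder latency
print(first_issue_cycle(sb, sources=[1, 2], cycle=1))  # 3: waits for r1
```

Separate integer and floating point register files would each need their own scoreboard in this model.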


Cache Organization 


The DECchip 21064-AA microprocessor contains two on-chip caches: a data cache
(Dcache) and an instruction cache (Icache). The chip also supports an external
cache.


Dcache - Contains 8K bytes and is a write-through, direct-mapped, read-allocate
physical cache with 32-byte blocks.


Icache - Contains 8K bytes and is a 
physical direct-mapped cache with 
32-byte blocks. 
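The Dcache and Icache geometry (8K bytes, direct-mapped, 32-byte blocks) fixes how an address splits into offset, index, and tag fields. The sketch below shows that split and a tag-only model of the Dcache's write-through, read-allocate policy, assuming a plain byte address and ignoring data, parity, and timing. All names are hypothetical.

```python
CACHE_BYTES = 8 * 1024   # 8K byte Dcache
BLOCK_BYTES = 32         # 32-byte blocks
NUM_BLOCKS = CACHE_BYTES // BLOCK_BYTES        # 256 blocks, one per index (direct-mapped)
OFFSET_BITS = BLOCK_BYTES.bit_length() - 1     # 5 -> bits 4:0
INDEX_BITS = NUM_BLOCKS.bit_length() - 1       # 8 -> bits 12:5


def split_address(addr: int) -> tuple[int, int, int]:
    """Return (tag, index, offset) for a byte address."""
    offset = addr & (BLOCK_BYTES - 1)
    index = (addr >> OFFSET_BITS) & (NUM_BLOCKS - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)
    return tag, index, offset


class DcacheModel:
    """Tag-only model of a direct-mapped, write-through, read-allocate cache."""

    def __init__(self) -> None:
        self.tags = [None] * NUM_BLOCKS   # tag per index; None means invalid
        self.writes_to_next_level = []    # addresses forwarded by write-through

    def read(self, addr: int) -> bool:
        """Return True on a hit; on a miss, allocate the block (read-allocate)."""
        tag, index, _ = split_address(addr)
        if self.tags[index] == tag:
            return True
        self.tags[index] = tag   # the one possible slot is simply overwritten
        return False

    def write(self, addr: int) -> None:
        """Write-through: every store goes to the next level; a write miss does not allocate."""
        self.writes_to_next_level.append(addr)
        # On a hit the cached data would also be updated; this tag-only model has no data.


print(split_address(0x1FE4))  # (0, 255, 4)
```

With 256 blocks of 32 bytes, the index falls in address bits 12:5, which lines up with the iAdr_h 12:5 Dcache invalidate address in the Signals table. Because an 8K byte page also has a 13-bit offset (bits 12:0), those index bits are the same in the virtual and the physical address.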


Characteristics

Power Supply                              Vss 0.0 V, Vdd 3.3 V ±5%
Operating Temperature (with proper
heatsink and airflow)                     0°C to 70°C
Storage Temperature Range                 -55°C to 125°C
Power Dissipation @ Vdd = 3.45 V,
Speed = 6.6 ns                            23 W typical, 27.5 W maximum


External Cache - The DECchip 21064-AA supports an external cache built from
off-the-shelf static RAMs. The DECchip 21064-AA directly controls the RAMs using
its programmable external cache interface, allowing each implementation to make
its own external cache speed and configuration trade-offs.


The external cache interface supports cache sizes from 0 to 8M bytes and a range
of operating speeds which are sub-multiples of the chip clock.


Virtual Address Space


The virtual address is a 64-bit unsigned integer that specifies a byte location
within the virtual address space. The DECchip 21064-AA microprocessor checks all
64 bits of a virtual address and implements a 43-bit subset of the address space.
The DECchip 21064-AA supports a physical address space of 16G bytes.
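One plausible reading of "checks all 64 bits ... implements a 43-bit subset" is that bits 63:43 must be copies of bit 42 (sign extension), so only 2^43 distinct addresses are usable. The sketch below assumes that rule; the brief itself does not spell it out. The 16G byte physical space corresponds to 34 physical address bits, consistent with adr_h 33:5 in the Signals table. Function names are hypothetical.

```python
VA_BITS = 43          # implemented virtual address bits (from the brief)
PA_BITS = 34          # 16G bytes == 2**34 bytes of physical address space
MASK64 = (1 << 64) - 1

assert (1 << PA_BITS) == 16 * 2**30   # sanity check: 16G bytes


def va_is_valid(va: int) -> bool:
    """Assumed rule: bits 63:42 must be all zeros or all ones (sign-extended from bit 42)."""
    upper = (va & MASK64) >> (VA_BITS - 1)      # the 22 bits 63:42
    return upper == 0 or upper == (1 << (64 - VA_BITS + 1)) - 1


def pa_is_valid(pa: int) -> bool:
    """A physical address must fit in PA_BITS bits."""
    return 0 <= pa < (1 << PA_BITS)


print(va_is_valid(0x0000_03FF_FFFF_FFFF))  # True: highest address in the lower half
print(va_is_valid(0x0000_0400_0000_0000))  # False: bit 42 set but bits 63:43 clear
print(va_is_valid(0xFFFF_FC00_0000_0000))  # True: lowest address in the upper half
```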


Alpha Architecture 
Summary 


The DECchip 21064-AA microprocessor implements the Alpha architecture. The Alpha
architecture supports:

• A fixed 32-bit instruction size
• Separate integer and floating point registers:
  - 32 64-bit integer registers
  - 32 64-bit floating point registers
• 32-bit (longword) and 64-bit (quadword) integer data types, along with 32-bit
  and 64-bit IEEE and VAX floating-point data types
• Memory access using a 64-bit virtual byte address
• Privileged Architecture Library Code (PALcode)


Instruction Set 


Instructions are all 32 bits in length, using four different instruction formats
that specify 0, 1, 2, or 3 5-bit register fields. Each format uses a 6-bit opcode.





[Figure: the CALL_PAL, Branch, Memory, and Operate instruction formats]
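The brief gives only the overall shape of the formats: a 6-bit opcode plus 0 to 3 five-bit register fields. The decoder below fills in field positions from the Alpha architecture as commonly documented (opcode in bits 31:26, Ra in 25:21, Rb in 20:16, Rc in 4:0, a 21-bit branch displacement, a 16-bit memory displacement, and an optional 8-bit literal in place of Rb for integer operates); treat those positions as an assumption to verify against the Alpha Architecture Reference Manual. It covers the integer operate layout only; floating-point operate instructions use a wider function field.

```python
def field(word: int, hi: int, lo: int) -> int:
    """Extract bits hi:lo (inclusive) of a 32-bit instruction word."""
    return (word >> lo) & ((1 << (hi - lo + 1)) - 1)


def sign_extend(value: int, width: int) -> int:
    """Interpret the low `width` bits of value as a two's complement number."""
    sign_bit = 1 << (width - 1)
    return (value ^ sign_bit) - sign_bit


def decode(word: int, fmt: str) -> dict:
    """Split an instruction word into fields for one of the four formats (assumed layouts)."""
    fields = {"opcode": field(word, 31, 26)}                  # 6-bit opcode in every format
    if fmt == "pal":                                          # 0 register fields
        fields["function"] = field(word, 25, 0)
    elif fmt == "branch":                                     # 1 register field
        fields["ra"] = field(word, 25, 21)
        fields["disp"] = sign_extend(field(word, 20, 0), 21)
    elif fmt == "memory":                                     # 2 register fields
        fields["ra"] = field(word, 25, 21)
        fields["rb"] = field(word, 20, 16)
        fields["disp"] = sign_extend(field(word, 15, 0), 16)
    elif fmt == "operate":                                    # 3 register fields (integer operate)
        fields["ra"] = field(word, 25, 21)
        if field(word, 12, 12):                               # bit 12 set: 8-bit literal replaces Rb
            fields["literal"] = field(word, 20, 13)
        else:
            fields["rb"] = field(word, 20, 16)
        fields["function"] = field(word, 11, 5)
        fields["rc"] = field(word, 4, 0)
    else:
        raise ValueError(f"unknown format: {fmt}")
    return fields


word = (0x10 << 26) | (1 << 21) | (2 << 16) | (0x20 << 5) | 3
print(decode(word, "operate"))
# {'opcode': 16, 'ra': 1, 'rb': 2, 'function': 32, 'rc': 3}
```

Keeping the format as an explicit argument avoids encoding an opcode-to-format table, which this brief does not provide.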


CALL_PAL Instructions - vector to a privileged library of software that
atomically performs both privileged and unprivileged functions.


Branch Instructions - Conditional branch instructions test a register for
positive/negative, zero/nonzero, or even/odd, and perform a PC-relative branch.
Unconditional branch instructions perform either a PC-relative branch or an
absolute jump to an arbitrary 64-bit register value. They can update a
destination register with a return address.


Load/Store Instructions - can move 
either 32-bit or 64-bit quantities. 
8-bit and 16-bit load/store operations 
are supported through an extensive 
set of in-register byte manipulations. 
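The brief says 8-bit and 16-bit accesses are built from in-register byte manipulation rather than dedicated byte loads and stores. The sketch below models that idea for single bytes using a little-endian byte array and the Alpha instruction names LDQ_U, EXTBL, INSBL, MSKBL, and STQ_U as plain Python functions; the QuadwordMemory class is hypothetical and ignores caching, traps, and timing.

```python
import struct

MASK64 = (1 << 64) - 1


class QuadwordMemory:
    """Hypothetical little-endian memory that only supports aligned quadword access."""

    def __init__(self, size: int) -> None:
        self.data = bytearray(size)

    def ldq_u(self, addr: int) -> int:
        # LDQ_U: load the aligned quadword that contains addr (low 3 bits ignored).
        return struct.unpack_from("<Q", self.data, addr & ~7)[0]

    def stq_u(self, addr: int, value: int) -> None:
        # STQ_U: store to the aligned quadword that contains addr.
        struct.pack_into("<Q", self.data, addr & ~7, value & MASK64)


def extbl(quad: int, addr: int) -> int:
    # EXTBL: shift the addressed byte down to bits 7:0.
    return (quad >> (8 * (addr & 7))) & 0xFF


def insbl(byte: int, addr: int) -> int:
    # INSBL: shift a byte up into its position within the quadword.
    return (byte & 0xFF) << (8 * (addr & 7))


def mskbl(quad: int, addr: int) -> int:
    # MSKBL: clear the addressed byte, keep the other seven.
    return quad & ~(0xFF << (8 * (addr & 7))) & MASK64


def load_byte(mem: QuadwordMemory, addr: int) -> int:
    return extbl(mem.ldq_u(addr), addr)


def store_byte(mem: QuadwordMemory, addr: int, value: int) -> None:
    # Read-modify-write of the containing quadword (not atomic across processors).
    quad = mem.ldq_u(addr)
    mem.stq_u(addr, mskbl(quad, addr) | insbl(value, addr))


mem = QuadwordMemory(16)
store_byte(mem, 3, 0xAB)
print(hex(mem.ldq_u(0)))  # 0xab000000
```

The store sequence is a read-modify-write of the whole quadword, so on a multiprocessor it would need to be wrapped in the load_locked/store_conditional retry loop described under Multiprocessor Support.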


Integer Operate Instructions - manipulate full 64-bit values, and include a full
complement of arithmetic, compare, logical, and shift instructions. In addition,
there are three 32-bit integer operates: add, subtract, and multiply.


In addition to the operations found in conventional RISC architectures, the Alpha
architecture provides scaled add/subtract for quick subscript calculation, 128-bit
multiply for division by a constant and multi-precision arithmetic, conditional
moves for avoiding branches, and an extensive set of in-register byte
manipulation instructions.
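Two of these operations, sketched on Python integers masked to 64 bits: a scaled add (modeled on Alpha's S8ADDQ, which computes Ra*8 + Rb, handy for indexing an array of quadwords) and a multiply that returns the high 64 bits of the 128-bit product (modeled on Alpha's UMULH). The second is used to divide by the constant 10 without a divide instruction, via the standard reciprocal ceil(2^67 / 10). The function names are illustrative, not an API.

```python
MASK64 = (1 << 64) - 1


def s8addq(ra: int, rb: int) -> int:
    # Scaled add: ra * 8 + rb, modulo 2**64 (e.g. base + index * 8 for quadword arrays).
    return ((ra << 3) + rb) & MASK64


def umulh(a: int, b: int) -> int:
    # High 64 bits of the unsigned 128-bit product a * b.
    return ((a & MASK64) * (b & MASK64)) >> 64


MAGIC_DIV10 = 0xCCCCCCCCCCCCCCCD  # ceil(2**67 / 10)


def div10(x: int) -> int:
    # Unsigned 64-bit x // 10 using one multiply-high and a shift, no divide.
    return umulh(x, MAGIC_DIV10) >> 3


print(hex(s8addq(3, 0x1000)))       # 0x1018
print(div10(12345678901234567890))  # 1234567890123456789
```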


Floating-Point Operate Instructions - include four complete sets of instructions
for IEEE single, IEEE double, VAX F_floating, and VAX G_floating arithmetic. In
addition to arithmetic instructions, there are also instructions for conversions
between floating and integer values, including the VAX D_floating data type.


Privileged Architecture Library Code


PALcode is a privileged library of software that atomically performs such
functions as the dispatching and servicing of interrupts, exceptions, task
switching, and additional privileged and unprivileged user instructions as
specified by operating systems using the CALL_PAL instruction.


PALcode is the only method of performing some operations on the hardware. In
addition to the entire Alpha instruction set, PALcode has access to a set of
implementation-specific instructions.


PALcode runs in an environment with privileges enabled, instruction stream
mapping disabled, and interrupts disabled. Disabling memory mapping allows
PALcode to support functions such as TB miss routines. Disabling interrupts
allows the instruction stream to provide multi-instruction sequences as atomic
operations.


Memory Management


The Alpha memory management architecture is designed to provide:

• A large address space for instructions and data
• Convenient and efficient sharing of instructions and data
• Independent read and write access protection
• Flexibility through programmable PALcode support


Alpha Architecture Compared to Conventional RISC Architecture 


The Alpha architecture is different from conventional RISC architectures in a number of ways: 


64-Bit Architecture - True 64-bit architecture with 64-bit data and addresses.
Not a 32-bit architecture that was later expanded to 64 bits.


High Speed - The Alpha architecture was designed to allow very high-speed
implementations. Simple instructions make it particularly easy to build
implementations that issue multiple instructions every CPU cycle. There are no
implementation-specific pipeline timing hazards, no load delay slots, and no
branch delay slots.


Multiprocessor Support - The Alpha architecture does not enforce strict
read/write ordering between multiple processors. This allows multiprocessor
implementations to easily use features such as multi-bank caches, bypassed write
buffers, write merging, and pipelined writes with retry on error. To maintain
strict ordering between accesses as seen by a second processor, memory barrier
instructions can be explicitly inserted in the program. The basic multiprocessor
interlocking primitive is a RISC-style load_locked, modify, store_conditional
sequence. If the sequence runs without interrupt, exception, or an interfering
write from another processor, the store succeeds. Otherwise, the store fails and
the program eventually must branch back and retry the sequence.


Multiple Operating Systems - The Alpha architecture provides flexibility by
allowing the user to implement a privileged library of software for operating
system specific operations. This allows Alpha to run full VMS using one version
of this software library that mirrors many of the VAX operating system features,
and to run OSF/1 using a different version that mirrors many of the MIPS
operating system features. Additional operating system implementations can be
efficiently supported.


Byte Manipulation - The Alpha architecture is unconventional in its approach to
byte manipulation. Byte loads, stores, and operations are done with normal 64-bit
instructions, crafted to keep the sequences short. Single-byte stores found in
conventional RISC architectures force cache and memory implementations to include
hardware byte operations and implement read-modify-write cycles, which can
complicate system design and reduce performance.


Arithmetic Traps - In contrast to conventional RISC architectures, the reporting
of Alpha architecture arithmetic traps (overflow, underflow, and others) is
imprecise. This removes architectural bottlenecks that affect performance. If
precise arithmetic exceptions are desired, trap barrier instructions can be
explicitly inserted in the program to force traps to be delivered at specific
points.


Alpha architecture includes a number of implementation-specific HINTS aimed at
allowing higher performance. Software is able to provide HINTS to the hardware
that enable the hardware to optimize its operation. HINTS can help improve the
utilization of the pipeline, cache memory, and translation lookaside buffers.
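The load_locked, modify, store_conditional sequence described under Multiprocessor Support can be sketched as a retry loop. The model below is a deliberate simplification: one processor's lock flag, tracked per exact address, with interfering writes from "another processor" simulated by an ordinary store. Real Alpha implementations (LDQ_L/STQ_C) track a larger locked region and clear the flag for other reasons too. All class and function names are hypothetical.

```python
MASK64 = (1 << 64) - 1


class LockedMemory:
    """Hypothetical single-processor view of memory with one lock flag."""

    def __init__(self) -> None:
        self.words = {}
        self.lock_flag = False
        self.lock_addr = None

    def load_locked(self, addr: int) -> int:
        # Read the value and set the lock flag for this address.
        self.lock_flag, self.lock_addr = True, addr
        return self.words.get(addr, 0)

    def store(self, addr: int, value: int) -> None:
        # An ordinary store, e.g. from another processor; it clears a matching lock.
        self.words[addr] = value & MASK64
        if addr == self.lock_addr:
            self.lock_flag = False

    def store_conditional(self, addr: int, value: int) -> bool:
        # Succeeds only if the lock flag survived since load_locked; the flag is always cleared.
        ok = self.lock_flag and addr == self.lock_addr
        if ok:
            self.words[addr] = value & MASK64
        self.lock_flag = False
        return ok


def atomic_add(mem: LockedMemory, addr: int, delta: int, between=None):
    """Load-locked / modify / store-conditional, retrying until the store succeeds."""
    attempts = 0
    while True:
        attempts += 1
        old = mem.load_locked(addr)
        new = (old + delta) & MASK64
        if between is not None:
            between(attempts)      # hook to simulate an interfering write
        if mem.store_conditional(addr, new):
            return new, attempts


mem = LockedMemory()
mem.store(0x100, 5)


def interfere(attempt: int) -> None:
    if attempt == 1:
        mem.store(0x100, 10)   # another processor writes between the locked load and the store


print(atomic_add(mem, 0x100, 1, interfere))  # (11, 2): first attempt fails, retry succeeds
```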


Signals 





Name               Type           Function

adr_h 33:5         Input/Output   Address bus
data_h 127:0       Input/Output   Data bus
check_h 27:0       Input/Output   Check bit bus
dOE_l              Input          Data bus output enable
dWSel_h 1:0        Input          Data bus write data select
dRAck_h 2:0        Input          Data bus data acknowledge
tagCEOE_h          Output         External cache RAM tagCtl, tagAdr CE/OE
tagCtlWE_h         Output         External cache RAM tagCtl WE
tagCtlV_h          Input/Output   Tag valid
tagCtlS_h          Input/Output   Tag shared
tagCtlD_h          Input/Output   Tag dirty
tagCtlP_h          Input/Output   Tag V/S/D parity
tagAdr_h 33:17     Input          Tag address
tagAdrP_h          Input          Tag address parity
tagOK_h,_l         Input          Tag access from CPU is OK
tagEq_l            Output         Tag compare output
dataCEOE_h 3:0     Output         External cache RAM data CE/OE, longword
dataWE_h 3:0       Output         External cache RAM data WE, longword
dataA_h 4:3        Output         External cache RAM data A 4:3
holdReq_h          Input          Hold request
holdAck_h          Output         Hold acknowledge
cReq_h 2:0         Output         Cycle request
cWMask_h 7:0       Output         Cycle write mask
cAck_h 2:0         Input          Cycle acknowledge
iAdr_h 12:5        Input          Invalidate address, Dcache
dInvReq_h          Input          Invalidate request, Dcache
dMapWE_h           Output         External Dcache duplicate tag RAM WE
irq_h 5:0          Input          Interrupt request
sRomOE_l           Output         Serial ROM output enable
sRomD_h            Input          Serial ROM data/Rx data
sRomClk_h          Output         Serial ROM clock/Tx data
vRef               Input          Input reference
eclOut_h           Input          Output mode selection
perf_cnt_h 1:0     Input          Performance counter inputs
threestate_l       Input          Three-state for testing
icMode_h 1:0       Input          Icache test mode selection
cont_l             Input          Continuity for testing
clkIn_h,_l         Input          Clock input
testClkIn_h,_l     Input          Clock input for testing
cpuClkOut_h        Output         CPU clock output
sysClkOut1_h,_l    Output         System clock output, normal
sysClkOut2_h,_l    Output         System clock output, delayed
dcOk_h             Input          Power and clocks OK
reset_l            Input          Reset


Packaging 


431 Pin Grid Array 
Package Dimensions

[Figure: package dimension drawing]


Information 


For more information on Digital’s DECchip 21064-AA Microprocessor call:

1-800-DEC-2717
1-800-DEC-2515 TTY

Orders may be placed through Digital’s Technical OEM (TOEM) Sales
Representatives. Call your local Digital Sales Office for details.
The information in this document is subject to change without notice and should not be construed as
a commitment by Digital Equipment Corporation. Digital Equipment Corporation assumes no
responsibility for any errors that may appear in this document.


Copyright © Digital Equipment Corporation 1992 


All Rights Reserved 
Printed in U.S.A. 


The following are trademarks of Digital Equipment Corporation: 21064, Digital, ULTRIX, VAX,
VMS, and the Digital logo.